Abstract In two randomly selected human genomes, 99.9%
of the DNA sequence is identical. The remaining 0.1% of
DNA contains sequence variations. The most common type
of such variation is called a single-nucleotide polymorphism,
or SNP. SNPs are highly abundant, stable, and
distributed throughout the genome. These variations are
associated with diversity in the population, individuality,
susceptibility to diseases, and individual response to medicine.
Recently, it has been suggested that SNPs can be used
for homogeneity testing and pharmacogenetic studies and
to identify and map complex, common diseases such as high
blood pressure, diabetes, and heart disease. Consistent with
this proposal is the identification of the patterns of SNPs
in conditions such as diabetes, schizophrenia, and blood-pressure
homeostasis. Although these studies have provided
insight into the nature of human sequence variation,
it is not known at present whether these variations are truly
significant toxicologically and pharmacologically. Moreover,
it is possible that most complex, common disorders
are caused by the combined effects of multiple genes and
nongenetic environmental factors (multifactorial). Therefore,
it is likely that sequence variation alone is not sufficient
to predict the risk of disease susceptibility, particularly
in homeostatic organisms like humans. Nevertheless, these
variants may provide a starting point for further inquiry.
Introduction

It is now clear that mapping the entire human genome
sequence is a reality and will be completed by the year 2003.
Many investigators now have to decide what to do with this
information and how to take advantage of this effort. The
majority of them believe that one possible benefit of this
information is to use it to understand the genetic basis of the
most common familial traits, evolutionary processes, and
complex and common diseases such as hypertension, diabetes,
obesity, and psychiatric disorders. These common diseases
are likely to be caused by multiple genes and multiple
nongenetic factors (environmental factors), each contributing
a modest effect. Their cumulative effect results in the
condition or trait. The traditional methods of identifying
disease-related genes are not readily applicable to the
detection of genes responsible for these multifactorial
diseases. The availability of the entire genome sequence,
therefore, may speed up the gene-hunting efforts in the
near future, but what approaches are to be taken to accomplish
this task and what are their limitations?



The human genome and the discovery of single-nucleotide
polymorphisms (SNPs) as genetic markers

In two randomly selected human genomes, 99.9% of the
DNA sequence is identical. The remaining 0.1% is thought
to include some differences or variations in the genome
between individuals. This variation, called polymorphism,
arises because of mutations. The simplest form of these
variations is the substitution of one single nucleotide for
another (Fig. 1A), termed a SNP. SNPs are more common
than other types of polymorphisms and occur at a frequency
of approximately 1 in 1000 base pairs (Brookes 1999)
throughout the genome (promoter region, coding sequences,
and intronic sequences). These simple changes in
DNA sequence, most of which are probably located in
intergenic spacers, are believed to be stable and not deleterious
to organisms. SNPs that do not change encoded amino
acids are called synonymous and are not subject to natural
selection (Kimura 1983, snp.cshl.org). On the other hand,
nonsynonymous SNPs alter amino acids and might be subject
to natural selection. SNPs can be observed between
individuals in a population, may influence promoter activity
or DNA and pre-mRNA conformation, and play a direct or
indirect role in phenotypic expression (Krawczak et al.
1992; Lohrer and Tangen 2000; Pitarque et al. 2001; Spicker
et al. 2001; LeVan et al. 2001). Because some SNPs are
functional, comparative studies on identical twins, fraternal
twins, and siblings suggest that genetic variation is one of
the factors associated with susceptibility to many common
diseases (Table 1) as well as every human trait such as
tallness, curly hair, and individuality (Martin et al. 1997).
Diversity in the population is also associated with these
variations. Therefore, it may be possible to understand why
some individuals are susceptible to common disorders by
using the human genome sequence and the variations
between individuals. However, there are limitations, and
practical and ethical issues must be considered before
undertaking such analyses (Chanock 2001).

Fig. 1. a Single-nucleotide polymorphisms (SNPs) are single-base
variations among different people. Figure shows strings of nucleotides
at which individuals A and B differ by just one base. Individual A may
respond to the drug positively, whereas individual B may show an
adverse reaction or moderate response or no response at all. b A long
stretch of DNA (e.g., 100,000 bases) with a distinctive pattern of SNPs
at a given location of a chromosome is called a haplotype. Haplotype
diversity may be generated by new SNP alleles, which can arise because
of mutations at different loci
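
The distinction between synonymous and nonsynonymous SNPs drawn
in this section can be illustrated with a short sketch. It assumes
the standard genetic code and a substitution that falls inside a
known reading frame; the function name classify_coding_snp and the
example codons are illustrative choices, not taken from any of the
cited studies. A change that creates or removes a stop codon is
simply reported as nonsynonymous here.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, new_base):
    """Classify a single-base substitution inside one codon.

    codon: reference codon on the coding strand, e.g. "GAA"
    position: 0, 1, or 2 (index of the substituted base in the codon)
    new_base: the variant base, one of T, C, A, G
    """
    codon = codon.upper()
    new_base = new_base.upper()
    if codon not in CODON_TABLE:
        raise ValueError("codon must be three letters from T, C, A, G")
    if len(new_base) != 1 or new_base not in BASES:
        raise ValueError("new_base must be one of T, C, A, G")
    if position not in (0, 1, 2):
        raise ValueError("position must be 0, 1, or 2")
    if codon[position] == new_base:
        raise ValueError("new_base must differ from the reference base")

    variant = codon[:position] + new_base + codon[position + 1:]
    # Same encoded amino acid (or both stop codons) means synonymous.
    if CODON_TABLE[codon] == CODON_TABLE[variant]:
        return "synonymous"
    return "nonsynonymous"


# GAA and GAG both encode glutamate.
print(classify_coding_snp("GAA", 2, "G"))
# GAG (glutamate) becomes GTG (valine).
print(classify_coding_snp("GAG", 1, "T"))
```

Third-position changes are often, but not always, synonymous, which
is why the classification is made by lookup rather than by position.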



Table 1. A partial list of disorders associated with SNPs

Disorder                        Gene               Reference
Asthma                          EDN 1 and NOS 1    Immervoll et al. 2001
POAG                            Myocilin           Colomb et al. 2001
Systemic sclerosis              Fibrillin 1        Tan et al. 2001
Lung cancer                     MMP-1              Zhu et al. 2001
Arrhythmias                     KCNQ1              Kubota et al. 2001
Idiopathic arthritis            MIF                Donn et al. 2001
Blood pressure                  TAFI               Koschinsky et al. 2001
Biliary cirrhosis               MBL                Matsushita et al. 2001
Type II diabetes                Syntaxin 1A        Tsunoda et al. 2001
Systemic lupus erythematosus    Prolactin          Stevens et al. 2001
Eating disorder                 Melanocortin       Adan and Vink 2001
Migraine                        Insulin receptor   McCarthy et al. 2001
Ossification                    Npps               Koshizuka et al. 2002
Lung cancer                     p53                Biros et al. 2001
Late-onset PD                   tau                Martin et al. 2001

SNPs, Single-nucleotide polymorphisms; POAG, primary open-angle
glaucoma; MMP-1, matrix metalloproteinase 1; EDN 1, endothelin 1;
NOS 1, neuronal nitric oxide synthetase 1; MBL, mannose binding
protein; Npps, nucleotide pyrophosphatase; TAFI, thrombin activable
fibrinolysis inhibitor; KCNQ1, potassium channel protein; MIF,
macrophage inhibitory factor; PD, Parkinson disease



Mapping and characterization of SNPs

Because most sequence variants are SNPs, a massive effort
has been undertaken by several private and public organizations,
such as Celera Genomics, Incyte Genomics, the
Wellcome Trust Sanger Institute in the United Kingdom,
and Washington University in the United States, to generate
a high-density SNP map of the genome (Marth et al.
2001; Irizarry et al. 2000; Altshuler et al. 2000; Mullikin et al.
2000; also see the International SNP Map Working Group
2001 and http://www.ncbi.nlm.nih.gov/SNP/). As a result of
this worldwide intensive effort, more than 2.8 million SNPs
have been identified and a high-density map has been constructed
in some cases (Iida et al. 2001a-d; Iwasaki et al.
2001; Saito et al. 2001; Osier et al. 2001). Although several
mapping methods (Table 2), such as single-strand conformational
polymorphism (Orita et al. 1989), denaturing
gradient gel electrophoresis (DGGE), enzymatic mutation
detection (Youil et al. 1995), microarray or variant detector
arrays (Wang et al. 1998; Marshall and Hodgson 1998;
Ramsay 1998; Hacia et al. 1999; Hacia and Collins 1999;
Dong et al. 2001; Qi et al. 2001; Yoshino et al. 2001), and
heteroduplex analysis (Lichten and Fox 1983) are available,
so far none of them has supplanted DNA sequencing as the
method of choice. SNP mapping requires a tremendous
amount of time and resources (hundreds or thousands of
individuals must be studied to eliminate false-positive and
false-negative results). Additionally, many of the methods
detect only about 80% of mutations. In some methods,
mutations in GC-rich regions may not be detected (e.g.,
DGGE), and other methods involve expensive instruments
and kits or toxic chemicals (Table 2). Therefore, new, faster
methods must be developed and old methods refined so
that, at least, cost is not a key consideration. In this respect,
recently developed high-throughput SNP genotyping
(Jenkins and Gibson 2002; McClay et al. 2002) and molecular
beacon methods (Mhlanga and Malmberg 2001) are
highly valuable. According to some estimates, 50% of mutations
are likely to be in a noncoding sequence, 25% lead to
amino acid substitution, and 25% are silent (Cargill et al.
1999; Halushka et al. 1999). However, it should be noted
that the above estimates are only for cSNPs (SNPs found in
protein-coding regions) and SNPs in untranslated regions
and depend on sample size because they have different
allele frequency distributions.
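
The dependence of SNP discovery on sample size noted above can be
made concrete with a simple calculation. The sketch below assumes
that chromosomes are sampled independently from a population in
which the allele has a fixed frequency, so that the chance of seeing
the allele at least once among n chromosomes is 1 - (1 - p)^n. This
model and both function names are illustrative assumptions, not a
procedure described in the cited studies.

```python
import math


def prob_allele_observed(allele_freq, n_chromosomes):
    """Probability that an allele with the given population frequency
    appears at least once among n independently sampled chromosomes."""
    if not 0.0 < allele_freq < 1.0:
        raise ValueError("allele_freq must lie strictly between 0 and 1")
    if n_chromosomes < 1:
        raise ValueError("n_chromosomes must be at least 1")
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


def chromosomes_needed(allele_freq, target_prob):
    """Smallest number of chromosomes giving at least target_prob
    chance of observing the allele once or more."""
    if not 0.0 < allele_freq < 1.0 or not 0.0 < target_prob < 1.0:
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - allele_freq))


# Each diploid individual contributes two chromosomes.
for individuals in (10, 100, 1000):
    p = prob_allele_observed(0.01, 2 * individuals)
    print(f"{individuals} individuals: P(observe 1% allele) = {p:.3f}")
```

Under these assumptions a rare allele is easily missed in a small
panel, which is one reason estimates of SNP proportions shift with
the number of individuals screened.
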

Table 2. A comparison of selected mutation screening methods

Method                          Fragment      Advantage                  Disadvantage                       Efficiency (%)
                                length (bp)
Single-strand conformational    ~300          No expensive equipment     Small fragments, temperature       80
  polymorphism                                                           variation
Heteroduplex                    300-600       No expensive equipment     Conditions to be determined        80
Denaturing gradient gel         100-1000      Simple, long and short     Gradient gel required, mutation    100 with GC clamps
  electrophoresis                             fragments                  in GC region may not be detected
Enzymatic mismatch detection    300-1000      Long and short fragments   Identifies all kinds of mutations  100
Base excision sequence          50-1000       Accurate                   Expensive instruments              100
  scanning
RNase cleavage                  1.6 kb        Longer fragment and fast   Requires special kit               100
Chemical cleavage               1-2 kb        Large fragment             Multistep, labor intensive,
                                                                         hazardous chemicals
DNA sequencing                  500           Rapid and easy, no         Labor intensive                    100
                                              additional sequencing


SNPs in gene discovery 

Once the map of these SNPs is confirmed, they can be used
for evolutionary biology studies, gene discovery and mapping,
prediction of drug and environmental response, diagnostic
tests, heterogeneity testing, and association studies
(Gray et al. 2000; Schork et al. 2000). For the purpose of
gene discovery, SNPs are considered to be the most predominant
segregating form of variation at the molecular
level because of their frequent occurrence throughout the
genome, and they can be useful in association studies. However,
they are less informative, in the sense that humans
have relatively low nucleotide diversity compared with
Drosophila and maize, than another type of marker called a
microsatellite. Microsatellites are simple sequence repeats,
the most common classes being dinucleotide, trinucleotide,
and tetranucleotide, and they occur at a rate of 1 in every
10 kb in a wide range of eukaryotic genomes. Human
microsatellites are used for linkage studies and they average
at least ten alleles with heterozygosity per locus over 80%
(very high). Although they have a much higher mutation
rate than the standard sequence, they are not densely distributed.
Linkage mapping focuses on the small number of
meiotic events within a family and association between
marker alleles and traits. It does not require a very dense
map of markers at the initial stage. On the other hand,
linkage disequilibrium mapping explores family associations
and requires a dense map of markers. In association
studies, a marker that is more prevalent in patients than in
those without the disease is considered evidence of association
between the disease and the marker (SNP). Although
there are many limitations (described below),
association studies are perhaps the best for mapping of
polygenic complex disease loci. However, this type of study
requires a large number of patients and an adequate control
group to achieve over 80% power to detect a locus. Once a
significant site is identified, one can use either a pedigree-based
transmission disequilibrium test (TDT), which measures
the transmission of alleles from a heterozygous parent
to the affected offspring (an unequal transmission of SNP
alleles to affected and unaffected siblings), or a case-control
population sampling, which measures the association between
SNPs and the disease in a large population (Keavney
et al. 2000; Spielman et al. 1992). TDT detects both linkage
and association, and SNPs are usually used. These methods
are based on the assumption that SNP variants account
for population susceptibility to certain disorders; however,
it is unknown how many SNPs are needed. These methods
have several limitations, such as difficulties associated
with population structure, different levels of linkage disequilibrium
in loci, allelic and nonallelic heterogeneity of
phenotypes, and epistatic interaction of alleles, all of which
have been previously discussed by others (Schork et al.
2000; Chakravarti 1999; Weiss and Terwilliger 2000).
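
The transmission disequilibrium test described above can be sketched
in a few lines. The version below assumes a single biallelic SNP and
uses the usual McNemar-type statistic, (b - c)^2 / (b + c), where b
and c count how often heterozygous parents did or did not transmit
the allele of interest to an affected child; the statistic is
referred to a chi-square distribution with 1 degree of freedom. The
function name tdt and the counts in the example are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test for one biallelic marker.

    transmitted: heterozygous parents who passed the allele of
        interest to an affected child
    untransmitted: heterozygous parents who passed the other allele
    Returns (chi_square, p_value) for 1 degree of freedom.
    """
    total = transmitted + untransmitted
    if transmitted < 0 or untransmitted < 0 or total == 0:
        raise ValueError("counts must be non-negative and not both zero")
    chi_square = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi_square, p_value = tdt(60, 40)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

Only heterozygous parents are counted because a homozygous parent
transmits the same allele regardless of linkage, so the test needs a
large number of informative families to reach useful power.
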
An allele frequency database for human polymorphic
sites for multiple populations (Osier et al. 2001; Hirakawa
et al. 2002) can be found on the Kidd lab home page
(http://info.med.yale.edu/genetics/kkidd/). Additionally, a
recently proposed haplotype (a distinct combination of
single nucleotide types on a single chromosome at a locus)
map (Fig. 1B) of the genome may speed up or simplify the
hunt for the association between DNA variations and complex
diseases (Helmuth 2001). This is because haplotype
diversity, which is greater than SNP diversity, may be generated
by new SNP alleles that can arise because of mutation
at different loci, and hence can be studied by both linkage
and association methods. In haplotype association studies,
multitype genotypes are reduced to haplotypes and this has
proved to be a more efficient mapping technique than that
of SNP analysis.
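
The statement that haplotype diversity exceeds SNP diversity can be
checked on a toy data set. The sketch below assumes phased
haplotypes given as equal-length strings and measures diversity as
1 minus the sum of squared frequencies, without a small-sample
correction. The four two-site haplotypes are invented for
illustration and do not come from any cited data set.

```python
from collections import Counter


def gene_diversity(alleles):
    """Return 1 - sum(p_i^2) over the distinct items in `alleles`."""
    n = len(alleles)
    if n == 0:
        raise ValueError("at least one allele is required")
    counts = Counter(alleles)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def site_diversities(haplotypes):
    """Diversity of each SNP site across a list of phased haplotypes."""
    if len({len(h) for h in haplotypes}) != 1:
        raise ValueError("all haplotypes must have the same length")
    # zip(*haplotypes) yields one column of bases per SNP site.
    return [gene_diversity(column) for column in zip(*haplotypes)]


# Hypothetical sample: four chromosomes typed at two SNP sites.
sample = ["AC", "AT", "GC", "GT"]
print("per-site diversity:", site_diversities(sample))
print("haplotype diversity:", gene_diversity(sample))
```

A biallelic site can never exceed a diversity of 0.5 under this
measure, whereas a block of several sites can carry many distinct
haplotypes, which is the sense in which haplotypes are the more
informative marker.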

SNPs in pharmacology

Another potential application of SNPs is the development
of individualized medicine. Inherited genetic differences
between individuals appear to determine each patient's
response to medicine. For instance, some patients show
response to the prescribed drug without any serious side
effects, whereas others do not respond and experience adverse
reactions. In fact, it is estimated that properly prescribed
medications cause 2 million Americans to get sick
and result in 100,000 deaths each year because of adverse
drug reactions (Pirmohamed and Park 2001; Bader 2001;
Wolf et al. 2000; Lazarou et al. 1998). Antitumor agent
6-mercaptopurine, for example, has been used for children
with leukemia. Those children who do not have its metabolizing
enzyme, thiopurine methyltransferase, show a
severe hematopoietic toxicity, whereas children with the
more active enzyme require higher doses of the drug.
Similar anti-drug reactions were also observed in asthma
and other conditions (Drazen et al. 1999; Roses 2000). The
most important of the several hundred thousand SNPs in the
human genome for this purpose are SNPs in drug-metabolizing
enzymes, particularly in the coding and the promoter
regions, because changes there may have functional significance
(Risch 2000). The SNP method can also be applied to
crop (corn and soybean) genetics (Rafalski 2002). Haplotype
analysis is possible for crop genetics because of the availability
of many genes and expressed sequence tags, as well as
their intraspecific nucleotide diversity.

SNPs and evolution 

SNPs can be used to study DNA sequence variation among 
species. Because such variations are present at all levels of 
evolution, including branching points of speciation, they 
may provide an understanding of how the modern genome 
evolved. Most SNPs are not in protein-coding regions but 
are elsewhere in the genome; therefore, their distribution 
is not under selective pressure. Variations in the protein- 
coding region that affect the phenotype might be subject to 
natural selection, but if these variations were retained in the 
gene over time, then they must have some benefit for the 
individual for successful reproduction. The variants that are 
selected for retention by natural selection may represent an 
important step in evolution. Thus, by calculating a ratio 
between variants in noncoding and coding regions of a 
series of protein families found in different species, it may 
be possible to trace the branching point of an evolutionary 
tree. At this branch point, the variant must have become 
advantageous for the species and hence fixed in the gene 
pool (Osier et al. 2001; Stephens et al. 2001; Liberles 2001).
While this gene pool has been continuously expanding dur- 
ing evolution, it might have resulted in the modern human 
genome. It is very well known from several studies that 
humans are similar to chimpanzees at the genomic level. 
However, there are differences between humans and chim- 
panzees. For instance, malaria; rheumatoid arthritis; and 
breast, colon, and lung cancers are extremely rare in chim- 
panzees but common in humans. In this regard, SNPs may 
provide important health clues. 



Conclusion 

There is no doubt that the identification of genes underlying
polygenic and complex diseases such as psychiatric disorders,
diabetes, hypertension, and asthma is of paramount
interest for clinicians, geneticists, patients, and the public.
However, it is not clear at present how genetic variation
alone determines the susceptibility of an individual to some
diseases or to adverse drug interactions. This is because
most common traits and phenotypes are the result of long-term
interaction between genetic and nongenetic environmental
factors. Factors such as lifestyle and diet may
contribute to disease susceptibility by altering gene expression.
Frequency of polymorphism may also vary among
different populations (Wakeley et al. 2000; Nielsen 2000;
Nielsen and Slatkin 2000). Therefore, even after complete
sequencing and identification of SNPs in an individual's
DNA, it is not simple to associate these variations with disease
without knowing the functional significance of the identified
SNPs. Additionally, researchers are struggling to understand
disease heterogeneity even in monogenic disorders.
In multigenic disorders, the contribution of individual
susceptibility genes to the disorder is very weak. Therefore,
genotype alone is not sufficient to predict susceptibility to
disease (Martin et al. 1997), nor is phenotype variation alone
necessarily linked to DNA sequence variation (Lander and
Schork 1994; Weeks and Lathrop 1995). On the other hand,
finding out how SNPs affect an individual's health and
transforming this knowledge into the development of new
medicine, which requires the correlation of SNPs with
specific diseases, will revolutionize the treatment of most
common killer diseases. This is because such an understanding
of the relationship between SNPs and the disease will
allow clinicians to determine whether an individual will
respond to a medicine or experience serious side effects.
Drug companies will be able to design different drugs for
each patient with similar clinical symptoms or disease phenotypes.
In future, this knowledge may give clinicians more
insight into the disease and change the definition of some
disorders.